Biological Nomenclatures: A Source of Lexical Knowledge and Ambiguity

Authors

  • Olivia Tuason
  • Lifeng Chen
  • Hongfang Liu
  • Judith A. Blake
  • Carol Friedman
Abstract

There has been increasing work on automated systems that use natural language processing (NLP) to recognize and extract genomic information from the literature. Recognition and identification of biological entities is a critical step in this process. NLP systems generally rely on nomenclatures and ontological specifications as resources for determining the names of the entities, assigning semantic categories that are consistent with the corresponding ontology, and assigning identifiers that map to well-defined entities within a particular nomenclature. Although nomenclatures and ontologies are valuable for text processing systems, they were developed to aid researchers and are heterogeneous in structure and semantics. A uniform resource that is automatically generated from diverse resources and designed for NLP purposes would be a useful tool for the field and would further database interoperability. This paper presents work towards this goal. We have automatically created lexical resources from four model organism nomenclature systems (mouse, fly, worm, and yeast) and have studied the performance of these resources within an existing NLP system, GENIES. Using nomenclatures is not straightforward because ambiguity, synonymy, and name variation all pose substantial challenges. In this paper we focus mainly on ambiguity. We determined that the proportion of ambiguous gene names within the individual nomenclatures, across the four nomenclatures, and with general English ranged from 0%-10.18%, 1.187%-20.30%, and 0%-2.49%, respectively. When actually processing text, we found the rate of ambiguous occurrences (not counting ambiguities stemming from English words) to range from 2.4%-32.9%, depending on the organisms considered.
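The three name-level ambiguity rates mentioned in the abstract (within one nomenclature, across nomenclatures, and against general English) can be illustrated with a minimal sketch. This is not the authors' method or the GENIES implementation. The lexicon layout, the function names, the toy gene names, and the identifiers are all hypothetical, and names are assumed to be already normalized to a common case.

```python
from typing import Dict, Iterable, Set

# Hypothetical layout: gene name -> set of identifiers in one nomenclature.
Lexicon = Dict[str, Set[str]]


def within_rate(lexicon: Lexicon) -> float:
    """Percentage of names that map to more than one identifier."""
    if not lexicon:
        return 0.0
    ambiguous = sum(1 for ids in lexicon.values() if len(ids) > 1)
    return 100.0 * ambiguous / len(lexicon)


def across_rate(lexicon: Lexicon, others: Iterable[Lexicon]) -> float:
    """Percentage of names that also occur in any other nomenclature."""
    if not lexicon:
        return 0.0
    other_names: Set[str] = set()
    for other in others:
        other_names.update(other.keys())
    shared = sum(1 for name in lexicon if name in other_names)
    return 100.0 * shared / len(lexicon)


def english_rate(lexicon: Lexicon, english_words: Set[str]) -> float:
    """Percentage of names that coincide with a general English word."""
    if not lexicon:
        return 0.0
    clashes = sum(1 for name in lexicon if name in english_words)
    return 100.0 * clashes / len(lexicon)


# Toy data with made-up identifiers (assumptions for illustration only).
mouse: Lexicon = {
    "wnt1": {"M:1"},
    "cat": {"M:2"},
    "abc1": {"M:3", "M:4"},  # one name, two identifiers
    "p53": {"M:5"},
}
fly: Lexicon = {"wnt1": {"F:1"}, "white": {"F:2"}}
english: Set[str] = {"cat", "white", "the"}

if __name__ == "__main__":
    print("within mouse:", within_rate(mouse))
    print("mouse vs. fly:", across_rate(mouse, [fly]))
    print("mouse vs. English:", english_rate(mouse, english))
```

The rate of ambiguous occurrences in running text, which the abstract reports separately, would need token counts from a corpus and is not covered by this sketch.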


Similar resources

Gesture and its impact of resolving lexical ambiguity

The study aimed to shed light on the use of gesture in resolving lexical ambiguity by TEFL students. To this end, 60 intermediate Iranian learners studying at Kish Way Language School in Iran were recruited. The participants were randomly assigned to two experimental groups and one control group. Both experimental groups received the same teaching approach, i.e. teaching homonyms ...


Gene name ambiguity of eukaryotic nomenclatures

MOTIVATION: With more and more scientific literature published online, the effective management and reuse of this knowledge has become problematic. Natural language processing (NLP) may be a potential solution by extracting, structuring and organizing biomedical information in online literature in a timely manner. One essential task is to recognize and identify genomic entities in text. 'Recogni...


Lexical Ambiguity and The Role of Knowledge Representation in Lexicon Design

The traditional framework for ambiguity resolution employs only 'static' knowledge, expressed generally as selectional restrictions or domain specific constraints, and makes no use of any specific knowledge manipulation mechanisms apart from the simple ability to match valences of structurally related words. In contrast, this paper suggests how a theory of lexical semantics making use of a kn...


Resolving lexical ambiguity in a deterministic parser

Lexical ambiguity and especially part-of-speech ambiguity is the source of much non-determinism in parsing. As a result, the resolution of lexical ambiguity presents deterministic parsing with a major test. If deterministic parsing is to be viable, it must be shown that lexical ambiguity can be resolved easily deterministically. In this paper, it is shown that Marcus's "diagnostics" can be hand...


Resolving Ambiguous Entity through Context Knowledge and Fuzzy Approach

Entity extraction is considered as a fundamental step in many text mining applications such as machine translation, text summarization and text categorization. However, the major challenging issue in extracting the entity from a sentence is the ambiguity problem, namely lexical ambiguity. While a human has a cognitive capability to resolve the meaning easily based on his/her knowledge, it is ve...



Journal title:
  • Pacific Symposium on Biocomputing


Publication date: 2004